RecursiveMAS cuts multi-agent latency by passing embeddings instead of text, delivering up to 2.4x faster inference and about 75% fewer tokens

Technology
Published on 16 May 2026
Agents “talk” through embeddings rather than intermediate text, making inference faster and cheaper

Researchers from UIUC and Stanford propose RecursiveMAS, a multi-agent framework that replaces text-to-text communication with latent embedding passing. Instead of generating reasoning tokens at every step, agents loop continuous representations through RecursiveLink modules and output text only at the end. Tests across nine benchmarks show 1.2x to 2.4x faster inference, roughly 75% fewer tokens by the third recursion round, and an average accuracy gain of 8.3% over the strongest baselines, with training far cheaper than full fine-tuning.
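The latent-passing loop described above can be illustrated with a toy sketch. This is not the authors' implementation: `ToyAgent`, `ToyLink`, `run_latent_rounds`, the latent width `DIM`, and the residual projection inside the link are all hypothetical stand-ins, since the summary does not describe how RecursiveLink modules work internally. The only point it demonstrates is the control flow: vectors circulate between agents for several rounds, and text is decoded once, at the end.

```python
import random

DIM = 8  # hypothetical latent width, chosen only for illustration


def make_matrix(rows, cols, rng):
    """Random small-weight matrix standing in for learned parameters."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vec):
    """Plain matrix-vector product on Python lists."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


class ToyAgent:
    """Stand-in for a frozen agent: maps a latent vector to a latent vector."""

    def __init__(self, rng):
        self.weights = make_matrix(DIM, DIM, rng)

    def think(self, latent):
        # ReLU-style nonlinearity; no text is generated in this step.
        return [max(0.0, v) for v in matvec(self.weights, latent)]


class ToyLink:
    """Stand-in for a small trainable module that hands latents to the next agent."""

    def __init__(self, rng):
        self.weights = make_matrix(DIM, DIM, rng)

    def forward(self, latent):
        # Assumed design: residual projection (incoming state plus an update).
        return [x + u for x, u in zip(latent, matvec(self.weights, latent))]


def run_latent_rounds(agents, links, latent, rounds, decode):
    """Pass latents through every agent for `rounds` rounds, then decode once."""
    for _ in range(rounds):
        for agent, link in zip(agents, links):
            latent = link.forward(agent.think(latent))
    return decode(latent)  # the only text-producing step


# Usage: three agents, three recursion rounds, one decode call.
rng = random.Random(0)
agents = [ToyAgent(rng) for _ in range(3)]
links = [ToyLink(rng) for _ in range(3)]
decode_calls = []


def decode(latent):
    """Hypothetical decoder: records that it ran and returns a label string."""
    decode_calls.append(1)
    return "answer-" + str(max(range(DIM), key=lambda i: latent[i]))


answer = run_latent_rounds(agents, links, [1.0] * DIM, rounds=3, decode=decode)
```

In a text-based pipeline, each of the nine agent steps in this run would emit and re-read tokens; here the decoder runs a single time, which is the source of the latency and token savings the summary reports.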

  • Average accuracy rises 8.3% over the strongest baselines
  • End-to-end inference speeds up 1.2x to 2.4x by avoiding stepwise text
  • Token usage drops 75.6% by recursion round three versus Recursive-TextMAS
  • Training updates only ~13M RecursiveLink parameters (about 0.31% of the total parameter count)
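To put the reported figures in concrete terms, the short calculation below works through two of them. It assumes the 0.31% figure is a share of the whole system's parameters, which the summary does not state explicitly, and it uses a hypothetical baseline of 10,000 tokens that does not come from the story.

```python
LINK_PARAMS = 13_000_000   # "~13M" RecursiveLink parameters, from the summary
LINK_SHARE = 0.0031        # "about 0.31%", read here as a share of all parameters
TOKEN_REDUCTION = 0.756    # "75.6%" fewer tokens by recursion round three

# Total parameter count implied by the two training figures (an inference,
# not a number reported in the summary).
implied_total = LINK_PARAMS / LINK_SHARE


def tokens_after_reduction(baseline_tokens, reduction=TOKEN_REDUCTION):
    """Tokens remaining after applying the reported reduction to a baseline."""
    return round(baseline_tokens * (1 - reduction))


print(f"Implied total parameters: {implied_total / 1e9:.2f}B")   # 4.19B
print(f"Tokens left from 10,000: {tokens_after_reduction(10_000)}")  # 2440
```

Under that reading, the trained modules sit on top of roughly 4.2 billion parameters that are not updated, and a text-based run that spends 10,000 tokens would need about 2,440 with latent passing.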
Read the full story at Venture Beat

This summary was produced by Beige from a story published on Venture Beat.
